Supplementary Material for "CLEARER: Multi-Scale Neural Architecture Search for Image Restoration"

Neural Information Processing Systems

Each module can be either a parallel module or a fusion module, which is determined by optimizing the architecture parameters αp and αf. Specifically, the two learned architectures both contain eight fusion modules and four parallel modules; the only difference between them is the position of the fusion and parallel modules. From these observations, we conclude that: 1) multi-scale information is remarkably important to image restoration.

Figure caption: From top to bottom for each image, the noise levels are σ = 30, 50, 70. From left to right: Input, BM3D [1], RED [9], WNNM [3], NLRN [6], DuRN-P [7], N3Net [10], CLEARER, and Ground truth.
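As a hedged illustration of the idea above (not the paper's actual implementation): once search finishes, each searchable position can be discretized by comparing its two learned architecture parameters, keeping the parallel module where αp dominates and the fusion module where αf does. The function name and the toy values below are assumptions for the sketch.

```python
def derive_architecture(alpha_p, alpha_f):
    """Pick 'parallel' or 'fusion' at each position from the learned
    architecture parameters (larger parameter wins)."""
    return ["parallel" if p > f else "fusion"
            for p, f in zip(alpha_p, alpha_f)]

# Toy learned parameters for four searchable positions (illustrative only).
alpha_p = [0.9, -0.2, 0.1, 1.3]
alpha_f = [0.4, 0.8, 0.7, -0.5]
print(derive_architecture(alpha_p, alpha_f))
# → ['parallel', 'fusion', 'fusion', 'parallel']
```

In the learned CLEARER architectures described above, such a discretization would yield eight fusion and four parallel modules, differing only in where each type sits.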







ZARTS: On Zero-order Optimization for Neural Architecture Search

Neural Information Processing Systems

Differentiable architecture search (DARTS) has been a popular one-shot paradigm for NAS due to its high efficiency. It introduces trainable architecture parameters to represent the importance of candidate operations and proposes a first/second-order approximation to estimate their gradients, making it possible to solve NAS by gradient descent. However, our in-depth empirical results show that the approximation often distorts the loss landscape, leading to a biased objective and, in turn, inaccurate gradient estimates for the architecture parameters. This work turns to zero-order optimization and proposes a novel NAS scheme, called ZARTS, that searches without enforcing the above approximation. Specifically, three representative zero-order optimization methods are introduced: RS, MGS, and GLD, among which MGS performs best by balancing accuracy and speed. Moreover, we explore the connections between RS/MGS and gradient descent, and show that ZARTS can be seen as a robust gradient-free counterpart to DARTS.
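To make the zero-order idea concrete, here is a minimal sketch (not the paper's implementation) of a two-point random-search-style gradient estimate for architecture parameters: the loss is probed along random directions, and the finite differences are averaged into a gradient estimate that needs no backpropagation through the approximation DARTS relies on. `loss_fn`, `zo_gradient`, and the toy quadratic loss are assumptions for illustration.

```python
import numpy as np

def zo_gradient(loss_fn, alpha, mu=1e-2, n_samples=2000, rng=None):
    """Two-point zero-order gradient estimate at alpha.

    Probes loss_fn along n_samples Gaussian directions u and averages
    (loss(alpha + mu*u) - loss(alpha)) / mu * u, which approximates the
    true gradient without any analytic differentiation.
    """
    rng = rng or np.random.default_rng(0)
    base = loss_fn(alpha)
    grad = np.zeros_like(alpha)
    for _ in range(n_samples):
        u = rng.standard_normal(alpha.shape)
        grad += (loss_fn(alpha + mu * u) - base) / mu * u
    return grad / n_samples

# Toy stand-in for the supernet validation loss: a quadratic whose
# true gradient at alpha is exactly 2 * alpha.
loss = lambda a: float(np.sum(a ** 2))
alpha = np.array([1.0, -2.0, 0.5])
g = zo_gradient(loss, alpha)
print(g)  # roughly [2.0, -4.0, 1.0], the true gradient
```

The resulting estimate can then drive an ordinary update such as `alpha -= lr * g`, which is the sense in which such zero-order schemes act as gradient-free counterparts to gradient descent on the architecture parameters.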